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A discrete-time method for solving problems in optimal quantum control is presented. Controlling 
the time discretized Markovian dynamics of a quantum system can be reduced to a Markov-decision 
process. We demonstrate this method in this with a class of simple one qubit systems, which are 
also discretized in space. For the task of state preparation we solve the examples both numerically 
and analytically with dynamic programming techniques. 
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I. INTRODUCTION 



Quantum control is a discipline that will see its importance match any of the technologies that it enables - quantum 
information sciences, to take a prominent example. 

But, the problems which arise in Quantum control can be very difficult to solve. This paper explores a possible 
direction towards formulating and solving problems from optimal quantum control by taking inspiration from quantum 
computing. 

, 1 * ' The control of quantum systems has been studied for decades, see e.g. Because real systems and measurements 
are often continuous [lrjj , the usual technique in optimal quantum control is to study the stochastic master equation 
^ ■ for the system being controlled. A sizable body of literature exists which details the control of continuous time 
quantum systems @. 

Some systems are inherently time-discrete like the Stern-Gerlach experiment or a quantum circuit, but any time- 
continuous system must be discretized if it is to be simulated on a quantum computer Q. These time-discrete 
CNl . simulations are what we seek to control, in lieu of the continuous model of the system. 

We model the system by slicing time into steps where the evolution on each step is given by a quantum circuit 
(including nonunitary quantum operations and measurements). Each timestep's circuit effects a Markovian step on 
the system's state. Since we want to control a Markov chain we turn to the theory of Markov decision processes 
(MDPs) [HI , specifically to the field of control theory called Discrete-Time Optimal Stochastic Control [|| . 

A MDP consists of two parts, the agent and the system. The agent consists of the experimenter and a (classical) 
computer system, the controller. The system is an open quantum system or a model thereof, interacting through its 
. environment, from which a measurement apparatus takes measurements which are reported to the agent. The agent 
t-H ' in turn can vary the Hamiltonian (apply controls) which acts on the system to some specified degree. The closed loop 
process of taking measurements and applying controls is repeated for either a certain amount of time or indefinitely, 
. — ■ depending on the goal the experimenter has in mind. The controller is programmed to use the information from the 
measurements along with the knowledge of the system dynamics to apply the appropriate controls in order to best 
further the goals of the experimenter. In this paper we discuss methods for finding the best program, on average, 
that the controller should use. 

The MDPs that arise in this setting have a continuous state space and transition matrices which arise from the 
dynamics of the system which give them more structure than most MDPs that are studied. It is this structure that 
gives us more power to solve these systems, for example, we can solve some MDPs analytically. 

We demonstrate solving a system by using two sample models, the affine-quadratic and the threshold models, 
so-named for the cost functions that determine their behavior. These models have a simple enough state space (the 
interval [0, 1]) that there is little difficulty in discretizing that space and solving the Bellman equations that arise 
numerically. 

The discrete technique presented in this paper is also applicable to more complicated models. A subsequent paper 
will study open systems undergoing decoherence and bipartite systems undergoing disentanglement. 

The two example models are chosen to be as elementary as possible, the system being that of a qubit undergoing 
a stochastic evolution in which the state remains pure. There is a desired target state that must be reached after a 
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fixed amount of time, but the target state is unstable. 

We solve the models first numerically and note peculiar behavior of the solutions, such as the as-if piecewise linear 
shape of the decision function graph, and corresponding parabolic shape of the value functions, which do not necessarily 
have a minimum at the target state. This means it is not good to stay near the target state if it is unstable and 
the end of the experiment is far off. As the value functions are propagated backwards in time as is done in dynamic 
programming fast convergence is evident for the value functions tending to a steady-state solution. This means that 
for a very long experiment relative to the probability of decaying away from the target state, the controller takes little 
action until the latter part of the experiment. 

Our parameterization of the measurement operators allows us to study the best measurements to make in order to 
control the system as efficiently as possible. Interestingly we find that having numerous (more than two) measurement 
operators can make the system easier to control, in the sense that the value functions are smaller compared to a system 
with only two measurement outcomes. 

These behaviors of the affine-quadratic model that are observed from numerical simulations can be explained by 
the analytic solution of the Bellman equation. Due to the simplicity of the system the Bellman equation can be 
approximately solved with known error bounds. 

A. The state preparation problem 

Algorithms in quantum computation often require an initial state to be prepared. The state preparation module 
would fill this need by preparing the desired state \4>t) (as closely as possible) at the correct time by starting with an 
arbitrary unprepared state fi(il |. Oftentimes one can perform a projective measurement in which one of the outcomes 
is the desired state to be prepared. This however, may not happen often enough, fast enough Q, or there may not 
be the physical resources available to make the correct state with a single measurement. 

In our two models the system begins in a known, but arbitrary pure state IV'o)- After N timesteps, the final state 
\iPn) is evaluated against the target state \^t) (there can be more than one target state in general). For the cost 
function formulation of the finite horizon MDP (see below), the success of the state preparation process of measured 
by the terminal cost function J^. Clearly should take its minima on the target states. 

For the two models described in this paper, there is a single target state \i/jt) = |0). The first model we deal with, 
the affine model, we choose 

J^QM) = Ard^iv), |<M) = 1 - (^nHt) 2 , (1) 

where D tI is the trace distance. Suppose a quantum algorithm expects \i/jt) as an input but gets \iptf) instead. 
Then that algorithm performs worst when J^ aff is greatest, in the following sense. Suppose the quantum algorithm 
is represented by a unitary operation W applied to \ipr), followed by a POVM with operators {IT}i. Outcome i 
is observed with probability trIliW ji \'ipT}{tpT\W . This provides the probability distribution {trn i W /1 '|'0T)(?/'T|W / }i- 
Taking the I 1 (classical trace) distance between that distribution and the one obtained by using \tpN) as the input 
state, {trILiW*\i/}N){i>N\W}i, we can get a reasonable measure of success. Since we are not specifying the particular 
algorithm we do not know the POVM that will be applied. However the quantum trace distance Dtr(IV'iv), IV't)) 
provides an upper bound for the I 1 distance for any POVMpjJ. Since we want the measure of success to be smaller 
when the probability distributions are similar, we get equation ([TJ. Equation ([IJ varies with the overlap (squared 
fidelity) of \iPn) and |"0t), the probability of observing one when making a measurement looking for the other. 

Open quantum systems can be approximated so that their environment "forgets" its interaction with the system 
very quickly (although in fact our simple model is closed) . Thus Markov processes are used to model quantum systems 
and in our discrete case these will be Markov chains. 



B. Markov decision processes 

A Markov decision process is a controllable Markov process. We will present MDPs in a slightly different way than 
is usual, to make it easier to adapt the quantum model to the formalism of the MDP. All Markov processes in this 
paper will be Markov chains, such that the time set / = {0, N — 1} is finite. A MDP is a set of states S together 
with sets of actions A(s,i) C S that can be taken at each state s £ S and time i £ I and transition probabilities 
P| )S '. If the current state at time i is s then Ps, s > is the probability that state s £ A(s,i) transitions to state s' in the 
next timestep where the action s is an intermediary state that is chosen by the agent out of the set A(s, i). The agent 
chooses the action by following a policy, a function it : S x I —> S such that ir(s, i) £ A(s, i). (There are formalizations 
of the policy function which enable the agent to act probabilistically and in a non-Markovian way but this level of 
generality is not needed in this paper [13] •) 



3 



The model allows only one way to take the state s to s at time i and it is assigned a control cost gi(s,s). An 
alternate formulation of MDPs uses rewards (negative of costs) associated to the state resulting at the next timestep. 
If <?j(s, s) is independent of time the subscript i is dropped. 

In summary, a single timestep proceeds as follows: 



H+i 



(2) 



where uji is a random outcome choosing s' ; = s^+i with probability P$ iS '.. 

Markov decision processes can be run for a definite or indefinite amount of time. With the state preparation 
problem, the experiment runs for a fixed amount of time (N timesteps). This is called the finite-horizon problem. 
Experiments which run for an indefinite amount of time, which can stop at a time unknown to the experimenter or 
run for an infinite amount of time (/ = N) are interesting and arise in Quantum control but the theory of modeling 
such systems with MDPs will not be discussed in this paper. 

For a finite-horizon problem, the state of the system at the end of the experiment sn is of importance. The terminal 
cost function J^(sat) evaluates sjy. If the experiment starts with initial state sq, with transition outcomes u> and the 
controller follows the policy ir then the total cost is 

N-l 

G(s 0> n,u) = ^2 g(si,Si) + JJv(siv). (3) 

where the Si are obtained from the procedure ©. Let 

JZ( So ) = E[G( SQi tt,lo)1 (4) 

the expected total cost of the experiment, following policy tt with initial state sq. For a fixed so we would like to find 
an optimal policy tt* that minimizes Jq{sq). In that case we write Jg(so) = Jq (sq). 

The Dynamic programming algorithm 0, 0, HH is a well-known technique for finding the optimal policies ir* . To 
state it, we define the optimal cost-to-go at time k, 



'N-l 



^2 g(siJi) + Jn( s n) 



i—k 



(5) 



where, as before, the Si are obtained from the process ((2j). The optimal cost-to-go tells us the minimum remaining 
total cost when starting in state s at time i. Under suitable conditions Q, 

J*(s) = U[J* +1 (s)} - inf {E [g(s, §) + J* +1 (s')] } , (6) 

for < i < N. The dynamic programming algorithm (jSJ) states that the optimal cost-to-go functions can be computed 
by iterating backwards in time. 

We will be concerned with cases when the infimum in equation ([6]) is achieved and so (an) optimal policy 7r*(s, i) — s 
is obtained from taking an argmin in equation ([B]). In the next section we will apply this framework to a general 
discretized physical system. 



C. The state preparation problem as a Markov decision process 



By truncating the Baker-Campbell-Hausdorff (BCH) formula or using the Trotter formula 0, [l2|, [H| on the 
system dynamics of a finite-dimensional open quantum system, a discrete time approximation can be obtained for the 
system: 



Controllable U r 



— > 


Free Uf 


— > 


Meas. 


— > 


Noise 



Si+l- 



(7) 



The circuit represents a single timeslice of system evolution. It is applied repeatedly to the starting state sq. The 
ordering of the blocks in ([7]) becomes less relevant as the length of the timeslice At — > 0. Questions arise pertaining 
to the behavior of the discrete model in the continuous time limit. These issues are discussed in [f|. 

The structure of the circuit mirrors our formulation of a MDP. The state Sj enters from the left. The U c block is 
the part where the agent takes an action, choosing a new state Si € A(s, i). The action is the state resulting from a 
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unitary operation applied to s chosen from a set of control Hamiltonians Hi that the agent has available at time i. 
Specifically, A(s,i) = {e' HAt s | H e Hi}. 

If A(s, i) does not depend on s and i then we can assume that any desired action is in A(s, i), and disallow physically 
unrealistic or undesirable actions at a given state by making the cost prohibitive. In that case (and in the examples) 
we effectively set A(s,i) = S. If the action is s, the cost paid at timestep i is g(s,s). The cost function will be 
explained in a later section. If there are multiple Hamiltonians in Hi which result in the same action we choose the 
best one under the same physical constraints that the cost function was chosen (in order to make the cost function 
well-defined) . 

The free evolution unitary Uf is the next step. This is coherent evolution of the system that cannot be controlled 
directly by the controller. This step can be folded into the previous step by composing U f with the action. 

The next two steps, measurement and noise can also be composed together into a single non-trace preserving 
quantum operation that sends the state s to one of the states {R a {s)} a &n 5= $ with probability distribution Pr(a|s) = 
Pg,R a ( s ') an( i se t °f measurement outcomes f2. In the finite dimensional system we consider, Q will be a finite set. 
Then the Dynamic programming algorithm (|6|) becomes 

J*(s)= min (/ g( S ,s) + J* +1 (R a (s))dPv(a\s)\, (8) 
seA(s, t ) {J n J 

A few properties of the Bellman iteration IA[-] from ((6J that are useful for analyzing the examples will be summarized. 

Proposition 1. Let S be compact and £1 be finite. Then U[-] is a map on C(S). That is, U[J] is continuous if J is. 
Additionally, 

I. If g(s, s) > for all s, s G S with equality for s — s then the constant function J(s) = C is a fixed point for IA[-] 
for all C . 

II. \\IA[J] — Z^IVJUoo < || J — VWco In particular, U[-\ is a continuous map. 
Proof. Item HI1 Fix s. Then 

\U[J](s) -U[V](s)\ = | min{«?( S , s) + V J(R a (s)) Pr(a\s)} 

S £ — * 

- mm{g{s, s) + V V(R a {§)) Pr(a|S)}| 

S * — ' 

<max| V(J(J? Q (s)) -V(R a (s))) Pr(a|s)| 

S £ — * 

< max | J(z) - V(z)\ VPr(a|s) 

Z — ' 

=V-y\W 

□ 

We expect proposition [T] is well-known in the field of discrete-time stochastic control but are unable to find sources. 
We now turn our attention to the specific models. 



II. OPTIMAL CONTROL PROBLEMS FOR CYLINDRICALLY SYMMETRIC QUBIT STATES 

We will demonstrate this technique with two single qubit models. In order to make the optimization procedure 
needed to solve the Bellman equation straightforward, we choose one of the simplest one-qubit systems, one that can 
be parameterized by a single real number. One way to satisfy this constraint is to confine an arbitrary state s in the 
system S to half of a great circle on the surface of the Bloch sphere B. We choose a basis {|0), |I)} so that if s x € S, 
then 



s x = Vr^|0) + Vi|l), (9) 

for x £ [0, 1]. We will hereafter abuse the notation by identifying the system S with its parameterization [0, 1] and 
state s x with its parameter x. When an additional parameter <f> is introduced to represent relative phase the entire 
Bloch sphere is swept out by rotating the set S about the z-axis (the axis containing |0) and |1)) as is varied. We 
may therefore consider any state s £ S to be the equivalence class {TZ z (<p)s} (TZ Z is the rotation about the z-axis 
operator) making the set of equivalence classes S a cylindrically symmetric version of B. 
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There are several reasons why our cylindrically-symmetric S is not an unrealistic choice. If the experiment requires 
the preparation of a unpopulated state |0) without any consideration to the relative phase, then the terminal cost 
function will be cylindrically symmetric. If the control step is strobing an appropriately aligned magnetic field, (for 
example, one with no a x or a z in the Hamiltonian) then the control cost function g is cylindrically symmetric. In fact, 
there are several families of physically-motivated operators which preserve S and hence are cylindrically symmetric. 



A. Operators on S 

We can think of a noisy quantum operation £ : s H ► s' being applied to a system p s = \ip s )(ip s \ as the system 
interacting with its environment p c . This can be modeled by an entangling unitary operator U sc acting on the 
environment-system p sc followed by a measurement M e on the environment. When the record of A4 e is ignored the 
resultant system state p s > = tr c (^ ■ MjU se p s U se ' M^) is left as a mixed state meaning that in general s' $ S. Now 
suppose the experimenter is allowed to monitor the environment (for example, the environment is an electromagnetic 
channel monitored with a photodetector [l3j]) and know which Mj occurred. Because in that case s' € S we will 
use the latter scenario in our model to derive some physically motivated operations on S. As explained below some 
important quantum operations (amplitude damping, phase damping, etc.) have Kraus operators which preserve S 
and these operators can serve as measurement operators for our system. 

Considering the set of states as column vectors, {IV^) = ^ ) I s x G S} is preserved by the real nonzero 

nonnegative-entry 2x2 matrices M2(K + ) \ {0}, up to a normalization factor. 
A finite subset {Kj} C M2(R + ) satisfying the completeness relation 

J2 K Mi = h, (io) 

3 

can represent the operators for either a complete measurement or a trace-preserving quantum operation (l2j (we think 
of it as the former). It is then a simple calculation to check that the Kj belong to either or both the filtering class 
(rank 2) 

r = {( a o g),(g 8)I«M<>} 

or the jump class (rank 1) 

J = {(° %),(% 8)>- 

The operators in T arise from the Bayesian updating of the state conditioned on the result of the measurement while 
the operators in J are direct jumps (e.g. wave function collapse) to a fixed state. 
An operator K e T U J is also considered a map on S, 

K : s s' 

such that 



Furthermore if K is taken to be a measurement operator then Pr(i^|s) = ||X|^)|| 2 for all s E S. Table U describes 
the measurement operators in that sense. 

Since the discrete systems in this paper are approximations of real continuous systems, in a future paper we 
investigate such operators in the infinitessimal time limit-the filtering operators go smoothly to the identity operator 
while the jump operators remain fixed jumps (although the probabilities of occurrence go to zero). 

Some examples are [l2[ amplitude damping {/i(l, 1— 7), Ji(7, 0)}, phase damping {/i(l, 1— 7), ji(0, 7)}, generalized 
amplitude damping {fi(p,p(l-j)) , /i((l-f>)(l-7)), ji(pj,0), j 2 (0, (l-p)-y)} and the bit-flip channel {/i(p,p), /a(l- 
P, 1 -p)}- 



B. Filtering and Jump operators 

When Mi £ J for all < % < m, the outcomes are in m discrete states, call them \ipi), < i < m, possibly 
nondistinct. There may be interesting problems when the set of the \ipi) are nonorthogonal, but we will not consider 
jump-only measurements further in this paper. 
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TABLE I: Measurement operators 

Name Operator K s' Pr(K\s) 
M a >V (f^) (Frfe (b-a)s + a 
Ma,b) (£f) (fe-a) S + a 

*>M) (^o) TiTb (a + b)(l-s) 



The measurement sets M. that we will use will contain at least one filtering operator and zero or more jump 
measurements so that M. contains two or more measurement operators. Experience suggests that having more than 
two nonorthogonal measurement operators can make the system easier to control. 



III. COST FUNCTIONS FOR A SIMPLE MODEL 



The two examples that we discuss in this paper differ only by their cost functions. For the first, the affine model, 
the terminal cost function is equation (JT|), which, when recast in the x-parameterization is 

J'n S {x n ) = \x-xt\ =x, (11) 

where the target state xt = if st = |0). Other functions which have their global minima at the target states could 
be justified. One might decide that a resultant state more than 0.2 units from the target state in S is unacceptable. 
Then we might have a Jjy like 

jtf***r x)= JI*-*H i£\x-x T \<0.2 

The number 100 was chosen to be much larger than typical values of the cost-to-go functions (essentially infinity) but 
not large enough to cause overflows in computation. 

With our discrete time formulation of the control problem, we do not directly deal with differential equations and 
so nonsmooth functions like J^ hrcsh can be incorporated naturally into the model. 




A. Cost-to-move function 



In addition to the terminal cost there are resources that are used by the controller at every timestep to drive the 
system. The use of these resources is quantified by the cost-to-move function. For simplicity, as mentioned in section 
II C[ the cost-to-move function g(s, s) will only depend in the current state s and the state s that the controller will 
move the system to in the next timestep by using one of its Hamiltonians. 

Part of what makes quantum control hard is that the controller can not move the system to any state that it 
chooses in a single step. Otherwise, the step preparation problem would be trivial. The transitions s i— > s which are 
inadmissible (disallowed) in the model are due to physical considerations. The Hamiltonians that the controller has 
available, the size of the timestep and the way the system is discretized determine which pairs (s, s) are inadmissible. 
In that case, we assign 

g(s,s) = oo 

for an inadmissible pair (s, s). As in equation (|12p . for computational purposes oo is taken to be a large number. The 
function J?' is the cost-to-go if the next action is s and then an optimal policy is followed thereafter. Optimizing 
J?'* over s is the same as the optimization step in the dynamic programming algorithm (DPA) equation ^ and so it 
often shows up in implementations of the DPA. The advantage in making all pairs in S x S admissible and redefining 
infinity to a large number is that it makes the domain of Jf'*(x) (in variables (x,y)) rectangular and finite on the 
entire domain, easing the optimization process. 

As for reasonable cost functions, the most straightforward metric, for states on the Bloch sphere B, might be the 
angular distance. Two states x,y € S have angular distance 



d s (x, y) = arccos (l - 2y - 2x + Axy + 4y/x(l - y)y(l - yj) 
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One might imagine that in the U c control step above the controller has some Hamiltonians that can be switched 
on and off for a percentage of the time afforded to the controller. Then perhaps all states that are a certain distance 
away in the dg metric can be reached, for a similar use of resources. We might have then 

thresh( } = [0, 0.2 

y v ,yj [100, if d e (x,y) > 0.2. v ' 

On a similar vein to the terminal cost, 100 is an arbitrary 'big' number so that if Jq(x) > 100 then there is no way 
to control the system from x to xt in the model. 

Other possible cost functions are dg or dg. Here we imagine that applying the Hamiltonians in the control step U c 
cost energy which is proportional to how far in the dg metric that the state has moved [ll[ . In this case we would like 
to balance the usage of energy to the closeness of the resultant state at the end of the experiment to the target state. 
In this situation, having infinite energy (g(s, s) — 0) might allow the controller to make very strong pulses toward the 
end of the experiment. Whether or not that is an optimal control solution depends on the probability of states near 
the target state decaying away before the end of the experiment. In Q the authors faced a similar problem. 

In order to make the system solvable analytically, we will use the second order Taylor polynomial of dg expanded 
about (^, ^), giving a quadratic symmetric in its arguments. It is 

g aH (x,y)=A(x-y) 2 . (14) 

Using the function C(s, s) — \s — s\ would be even simpler but leads to trivial controls when equation is solved 
for the example model in the next section. 



IV. SOLVING THE DPA FOR THE MODELS 



Putting the affine terminal cost (fTT|) and the quadratic movement cost (|14ll into equation (j6|) we get 

J U{X ) = nun | 4( , -yf + £ J* (M±|) ( Cj y + dj ) J , (15) 

with constants dj, bj, Cj and dj which depend on the measurement set chosen from table HI We will also solve equation 
© numerically using the threshold cost functions (TT21 and (TT^|) . 

In order to solve equations (1151) numerically, we must specify the measurement set to be used. Numerous examples 
that we have experimented with have the same basic behavior, which is typified by the system with measurement 
operators {/i(a, 1), ^(0, 1 — a)}. With some rescaling equation (fTS*)) becomes 

U[J]{x) = o min Ux -yf + jf — * - ((1 - a )y + a) + J(l)(l - ((1 - a)y + a)) X . (16) 

The parameter < a < 1 determines how unstable the target state is. If the system is a two-level atom then it is the 
rate of decay. We can vary a in order to study how the optimal policies change with the instability of the system. 



A. Solving numerically 

A simple way to proceed numerically is to discretize S into say, 100 equally spaced points. For the resolution 
desired this naive assumption turns out to be acceptable for reasons that will become clear in the analytic solution 
section. The functions J* can be approximated as a discrete list of values and computed recursively by minimizing 
over a discrete list. Figure ITV Ah . shows a graph of J* for i = 0, 1, . . . , 5 when a — 0.8. Similar graphs are obtained 
for < a < 1 and for systems with other sets of measurement operators. Figure II V Ab . shows the optimal policies 
TT*(x,i)) for the affine problem. From the picture, the n*(x,i) appear to be piecewise affine. The analytic solution 
will tell us that it is not the case, although the n*(x,i) are closely approximated by piecewise affine functions. 

When we use the threshold functions, a typical run might look like figure IIVAT b.,d.). The cost-to-go functions are 
quite different for the two models. One can easily see how the cutoff in thresh propagates through the J*. We 
expect an analytic analysis of this model to be fairly easy. Another interesting feature which is shared with the latter 
model is that the local minima for the various value functions do not lie at the target state. This is due to the large 
probability of a state near the target state decaying away before the end of the experiment. 
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B. Approximating the solution analytically 



We can explain some of the behaviors that arose with the affine/quadratic model (fl6|) . The goal is to show that if 
cost-to-go J is quadratic then Li [J] is approximately quadratic. Appealing to the continuity of the iteration map, we 
can obtain reasonable approximations to W [J] for small values of j. Due to constant-function hxed points of Li[-], for 
larger values of j, W[J\ becomes uninteresting. 

Suppose J(x) — Ax 2 + Bx + C. Considering the sum in equation (fT5j). we get linear terms from jump- type 
measurements and for each filtering-type measurement operator we get a term like J( °^^ )(cy + d). Evaluating this, 
we get a quadratic term and a fractional term N(y) given by 



N(y) 



A(ad - be) 2 



c 2 (d + cy) 

The second order Lagrange interpolating polynomial agreeing with N(y) at the points yi = 0, h, 1 is 

2c 2 



(17) 



. , A(ad - be) 2 ( 1 c(3c + 2d) 

L (y) = 3 ( - - , ; K^ v 



with error bounds 0] 



d d{c + d)(c + 2d) d(c + d)(c + 2d) 



\N(y)-L(y)\< ^^^ \y { y-l)(y-^ 



y 



The maximum error is bounded by + )"i2V3 ' ^ or an °P era t° r /i( a i b) or /2(a, b), the bound becomes 

Aa 2 (b-a) 



b 2 12^3 

For example, if J was an affine function (A = 0) then the approximation is exact. 

Let the approximate to Li[J] be called U[J\. Computing Li[J](x) amounts to finding the minimum of a quadratic 
parameterized by a; on a closed interval. Li [J] is thus a piecewise quadratic function. Likewise to approximate Li 2 [J] 
we compute W[W[J]] and get a bound on the error. We do not find the occasion to repeat this procedure to find, 
for example an approximation to Jg due to the algebra involved and the efficacy of the numerical method. The 
analytic solution explains why the graphs of the tt* look piecewise linear - because the graphs of the J* look piecewise 
quadratic. 




FIG. 1: Example costs-to-go and optimal policies. 
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V. CONCLUSION 

We have demonstrated a method for controlling a quantum system by controlling its simulation. We have shown 
that the method works by applying it to two simple models. The method has been applied to more complicated 
models: a qubit undergoing decoherence, which has a more complicated state space, and two qubits undergoing 
disentanglement. These situations will appear in a future paper. 

We need to resolve issues with the discretization. How small does the timeslice have to be to get 'good' control? 
How does the MDP turn into a stochastic differential equation as the timeslice goes to 0? 

We are concerned with how the control problem scales with the size of the system. When searching the state space 
in the optimization step we rely on good parameterizations of the system. These become harder to come by as the 
size of the system increases. 
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